CELLPHONE PRICE RANGE PREDICTION CLASSIFICATION MODEL

A mobile phone, cell phone, cellphone, or hand phone, sometimes shortened to simply mobile, cell or just phone, is a portable telephone that can make and receive calls over a radio frequency link while the user is moving within a telephone service area.

The first handheld mobile phone was demonstrated by John F. Mitchell and Martin Cooper of Motorola in 1973, using a handset weighing c. 2 kilograms (4.4 lbs).

In 1979, Nippon Telegraph and Telephone (NTT) launched the world's first cellular network in Japan. In 1983, the DynaTAC 8000x was the first commercially available handheld mobile phone. From 1983 to 2014, worldwide mobile phone subscriptions grew to over seven billion—enough to provide one for every person on Earth.

In the first quarter of 2016, the top smartphone developers worldwide were Samsung, Apple, and Huawei, and smartphone sales represented 78 percent of total mobile phone sales. For feature phones (or "dumbphones"), the largest developers as of 2016 were Samsung, Nokia, and Alcatel.

Source: https://en.wikipedia.org/wiki/Mobile_phone

Evolution of Mobile Phone

[Image: evolution of the mobile phone]

Import all necessary Libraries

Load Data

The following table shows the first 6 entries.

Profile Report of Data

Check Correlation of Data

INSIGHT

Some features have a highly positive correlation.
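As an illustration of the correlation check, here is a minimal sketch on synthetic data. The column names `ram`, `battery_power` and `price_range` mirror the dataset, but the values are randomly generated and `price_range` is derived from `ram` on purpose, so the numbers say nothing about the real data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset (assumption: values are made up).
rng = np.random.default_rng(0)
n = 500
ram = rng.integers(256, 4000, size=n)
df = pd.DataFrame({
    "ram": ram,
    "battery_power": rng.integers(500, 2000, size=n),
    # price_range is binned from ram, so the two are correlated by construction
    "price_range": pd.cut(ram, bins=4, labels=False),
})

# Pearson correlation matrix of all numeric columns
corr = df.corr()

# Correlation of every feature with the target, strongest first
target_corr = corr["price_range"].drop("price_range").sort_values(ascending=False)
print(target_corr)
```

Sorting the target column of the correlation matrix is a quick way to see which features move together with the price range.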

Check Variance of Data

Battery Power - RAM - Price Range

The following scatter plot shows battery power and RAM values by price range.

The following histogram and violin plot show the distribution of the RAM data.

3G - 4G - Price Range

The following swarm plots show how 3G support and 4G support are distributed across price ranges.

The following pie plots show the percentage of mobiles with and without 3G and 4G.

Check whether the price range is balanced
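A balance check can be as simple as counting the rows per class. The sketch below uses a small hand-made `price_range` column (an assumption, not the real data) to show the idea.

```python
import pandas as pd

# Toy target column with four classes, five samples each (made-up data)
price_range = pd.Series([0, 1, 2, 3] * 5, name="price_range")

# Absolute and relative class frequencies
counts = price_range.value_counts().sort_index()
share = price_range.value_counts(normalize=True).sort_index()

# Ratio of the rarest to the most common class: 1.0 means perfectly balanced
balance_ratio = counts.min() / counts.max()
print(counts)
print(share)
print("balance ratio:", balance_ratio)
```

A ratio close to 1.0 suggests plain accuracy is a reasonable metric. A much lower ratio would call for stratified splits and class-aware metrics.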

The following bar plot shows the distribution of n_cores (number of processor cores).

The following pie plot shows the percentage of mobiles with and without Bluetooth.

Dual Sim - Ram - Price Range

The following graph shows the relation between dual SIM support and RAM by price range.

The following pie plot shows the percentage of mobiles with and without dual SIM support.

The following pie plot shows the percentage of mobiles with and without Wi-Fi.

Price Range - Battery

The following point plot shows the relation between price range and battery power.

The following histogram and violin plot show the distribution of the battery data.

The following histogram and violin plot show the distribution of the int_memory (internal memory) data.

The following violin plot shows how RAM is distributed within each price range.

The following violin plot shows how battery power is distributed within each price range.

The following histogram shows how price varies with px_height (pixel resolution height).

The following histogram shows how price varies with px_width (pixel resolution width).

Remove the target feature from the dataset for further modeling

Feature Scaling

Splitting data into train and test sets
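The scaling and splitting steps can be sketched as follows, in the same order as the headings above. The data is synthetic, and `StandardScaler`, `test_size=0.2`, `stratify=y` and `random_state=42` are assumptions chosen for illustration, not settings confirmed by this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features and a balanced four-class target (made-up data)
rng = np.random.default_rng(42)
X = rng.normal(loc=1000.0, scale=300.0, size=(200, 4))
y = np.repeat([0, 1, 2, 3], 50)

# Feature scaling: zero mean and unit variance per column
X_scaled = StandardScaler().fit_transform(X)

# Train/test split; stratify keeps the class proportions equal in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)
```

A common alternative is to split first and fit the scaler on the training part only, so that no test-set statistics influence the scaling.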

Modelling - Algorithms selection

PyCaret is used to select which classification algorithms work best for this model.

To use PyCaret, we convert the split data back into a DataFrame.

Train the model using the top 10 algorithms
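Converting the split arrays back into a DataFrame might look like the sketch below. The array contents and the shortened feature list are made up for illustration; only the column names come from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical split output: a scaled feature array and its labels
feature_names = ["battery_power", "px_height", "px_width", "ram"]
X_train = np.arange(12, dtype=float).reshape(3, 4)
y_train = np.array([0, 2, 1])

# Rebuild a DataFrame with named feature columns and append the target
train_df = pd.DataFrame(X_train, columns=feature_names)
train_df["price_range"] = y_train
print(train_df)
```

A frame like `train_df` could then be handed to PyCaret's classification `setup` with `data=train_df` and `target="price_range"`. The exact call used in this notebook is not shown here.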

Visualization for Training

Cross-validation test for accuracy score and mean squared error
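A minimal sketch of such a cross-validation test is shown below, assuming 5 folds, a logistic regression model, and data generated with `make_classification`. None of these choices are confirmed by the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic four-class problem (made-up data)
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=4, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
model = LogisticRegression(max_iter=1000)

# Accuracy per fold
acc_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

# scikit-learn reports the negated MSE, so flip the sign back
mse_scores = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")

print("mean CV accuracy:", acc_scores.mean())
print("mean CV MSE:", mse_scores.mean())
```

Mean squared error is only meaningful here because the price range labels 0 to 3 are ordered, so predicting 3 for a true 0 counts as a larger error than predicting 1.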

Testing of model and Visualization

Calculating Accuracy score, Recall, Precision, F1 Score and Mean square error
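These metrics can be computed with `sklearn.metrics`. The sketch below uses eight hand-written labels and predictions, and assumes macro averaging for the multi-class recall, precision and F1 score. The averaging mode used in this notebook is not stated.

```python
from sklearn.metrics import (
    accuracy_score, f1_score, mean_squared_error,
    precision_score, recall_score,
)

# Toy labels and predictions (made-up values): two of eight are wrong
y_true = [0, 1, 2, 3, 0, 1, 2, 3]
y_pred = [0, 1, 2, 3, 0, 2, 2, 2]

accuracy = accuracy_score(y_true, y_pred)
# Macro averaging: compute the metric per class, then take the plain mean
recall = recall_score(y_true, y_pred, average="macro")
precision = precision_score(y_true, y_pred, average="macro")
f1 = f1_score(y_true, y_pred, average="macro")
mse = mean_squared_error(y_true, y_pred)

print(accuracy, recall, precision, f1, mse)
```

Here accuracy is 6/8 = 0.75 and the mean squared error is 2/8 = 0.25, because both wrong predictions are off by exactly one price class.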

Evaluation Matrix

Confusion matrix and Classification report
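For illustration, the confusion matrix and classification report for a small set of made-up labels and predictions can be produced like this:

```python
from sklearn.metrics import classification_report, confusion_matrix

# Toy labels and predictions (made-up values)
y_true = [0, 1, 2, 3, 0, 1, 2, 3]
y_pred = [0, 1, 2, 3, 0, 2, 2, 2]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Per-class precision, recall, F1 score and support as formatted text
report = classification_report(y_true, y_pred)
print(report)
```

Off-diagonal entries show which price ranges get confused with each other. In this toy case one class 1 sample and one class 3 sample are predicted as class 2.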

FEATURE SELECTION

SelectKBest: removes all but the k highest scoring features.

For classification, these scoring functions are generally used: chi2, f_classif, mutual_info_classif
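Here is a minimal sketch of `SelectKBest` with the `f_classif` score on synthetic data. Two columns are built to depend on the target and four are pure noise. The names `noise_a` to `noise_d` and the choice `k=2` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data: two informative columns, four noise columns (made-up values)
rng = np.random.default_rng(0)
n = 400
y = np.repeat([0, 1, 2, 3], n // 4)
informative = y[:, None] * 2.0 + rng.normal(size=(n, 2))
noise = rng.normal(size=(n, 4))
X = pd.DataFrame(
    np.hstack([informative, noise]),
    columns=["ram", "battery_power", "noise_a", "noise_b", "noise_c", "noise_d"],
)

# Keep the k features with the highest ANOVA F-score against the target
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

# get_support() returns a boolean mask over the original columns
selected = X.columns[selector.get_support()].tolist()
print(selected)
```

Note that `chi2` requires non-negative feature values, so it would need to run on the raw counts or on min-max scaled data rather than on standardized features.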

In the graph above, we can see that 5 or 6 features give higher training accuracies.

These are the features with high correlation: battery_power, int_memory, px_height, px_width, mobile_wt, ram

We repeat the modeling with the selected features and check all values again

Visualization for Training model (With Selected Features)

Cross-validation test for accuracy score and mean squared error

Testing of model (With Selected Features) and Visualization

Calculating Accuracy score, Recall, Precision, F1 Score and Mean square error

Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outliers detection.

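The advantages of support vector machines are: they are effective in high-dimensional spaces, they are memory efficient because the decision function uses only a subset of the training points (the support vectors), and they are versatile because different kernel functions can be specified for the decision function.

The disadvantages of support vector machines include: when the number of features is much greater than the number of samples, the kernel and regularization term must be chosen carefully to avoid over-fitting, and SVMs do not directly provide probability estimates.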

We tune the SVM because it had one of the lowest accuracies in training and testing of the model (without feature selection). If tuning brings its accuracy up to the mark, the SVM will be used as the prediction-ready model.

Tuning with the help of Grid search

StratifiedKFold: provides train/test indices to split data into train/test sets, while preserving the percentage of samples of each class in every fold.
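The tuning step can be sketched with `GridSearchCV` and a `StratifiedKFold` splitter as below. The data is synthetic and the parameter grid (`C` and `kernel` values), the 5 folds and the random seeds are assumptions, not the grid actually searched in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic four-class problem (made-up data)
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=4, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

# Hypothetical search space for the SVM
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# Stratified folds keep the class proportions equal in every split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Fit one SVC per parameter combination and fold, scored by accuracy
search = GridSearchCV(SVC(), param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best CV accuracy:", search.best_score_)
```

After fitting, `search.best_estimator_` holds an SVC refitted on all the data passed to `fit`, using the best parameter combination.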

Training and testing accuracy of the SVM model after tuning

Conclusion

Before feature selection (modeling done without feature selection)

Cross validation Accuracy

CatBoost = 0.943750, Logistic Regression = 0.968750, LDA = 0.956250, QDA = 0.928125, Gradient Boosting = 0.912500, XGBoost = 0.918750, LightGBM = 0.912500, Random Forest = 0.896875, Decision Tree = 0.850000, Support Vector = 0.893750

Test Accuracy

CatBoost = 0.9425, Logistic Regression = 0.9775, LDA = 0.9450, QDA = 0.9400, Gradient Boosting = 0.9050, XGBoost = 0.9025, LightGBM = 0.9025, Random Forest = 0.8875, Decision Tree = 0.8275, Support Vector = 0.8925

After feature selection (modeling done with the selected features)

Cross validation Accuracy

CatBoost = 0.950000, Logistic Regression = 0.975000, LDA = 0.950000, QDA = 0.981250, Gradient Boosting = 0.915625, XGBoost = 0.934375, LightGBM = 0.918750, Random Forest = 0.937500, Decision Tree = 0.887500, Support Vector = 0.953125

Test Accuracy

CatBoost = 0.9500, Logistic Regression = 0.9825, LDA = 0.9500, QDA = 0.9750, Gradient Boosting = 0.9050, XGBoost = 0.9050, LightGBM = 0.9200, Random Forest = 0.9200, Decision Tree = 0.8450, Support Vector = 0.9375

Final SVM with tuning and feature selection

Train Accuracy: 0.975 and Test Accuracy: 0.98

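The accuracy of the model increased when we performed feature selection and increased again when we tuned the model. For the SVM, test accuracy rose from 0.8925 to 0.9375 with feature selection and to 0.98 after tuning.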

Create PKL file

We train the SVM for cellphone price range prediction.
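Saving a trained model to a PKL file can be sketched with the standard `pickle` module. The training data below is synthetic, and the file name `svm_model.pkl` and the temporary directory are hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.svm import SVC

# Synthetic four-class data where the features depend on the label (made-up)
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2, 3], 20)
X = y[:, None] + rng.normal(scale=0.3, size=(80, 3))

model = SVC(kernel="linear").fit(X, y)

# Hypothetical output path inside a temporary directory
pkl_path = os.path.join(tempfile.mkdtemp(), "svm_model.pkl")

# Serialize the fitted model to disk
with open(pkl_path, "wb") as f:
    pickle.dump(model, f)

# Load it back and confirm that it predicts the same labels
with open(pkl_path, "rb") as f:
    restored = pickle.load(f)

print(np.array_equal(model.predict(X), restored.predict(X)))
```

If the features were scaled or reduced before training, the fitted scaler and feature selector need to be saved as well, since new inputs must pass through the same steps. Pickle files should only be loaded from trusted sources.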
